Automatic extraction of differences between spoken and written languages, and automatic translation from the written to the spoken language

Authors

  • Masaki Murata
  • Hitoshi Isahara
Abstract

We extracted the differences between spoken language and written language from a spoken-language corpus and a written-language corpus by using the UNIX command "diff" and examined the differences to determine the construction of the grammars of the two corpora. We also transformed written-language sentences into spoken-language sentences by using rules based on the extracted differences.
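
The two steps described in the abstract, extracting differences and then rewriting with rules built from them, can be illustrated with a small sketch. This is not the authors' implementation: it assumes sentence-aligned written/spoken pairs, uses Python's `difflib.SequenceMatcher` in place of the UNIX diff command, and the function names (`extract_differences`, `learn_rules`, `to_spoken`) and the English toy sentences are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def extract_differences(written_tokens, spoken_tokens):
    """Return (written_span, spoken_span) pairs where the two token lists differ."""
    matcher = SequenceMatcher(a=written_tokens, b=spoken_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((tuple(written_tokens[i1:i2]), tuple(spoken_tokens[j1:j2])))
    return pairs


def learn_rules(aligned_pairs, min_count=1):
    """Count substitution differences; keep the most frequent rewrite per written span."""
    counts = Counter()
    for written, spoken in aligned_pairs:
        for w_span, s_span in extract_differences(written.split(), spoken.split()):
            if w_span and s_span:  # substitutions only; pure insertions/deletions are skipped
                counts[(w_span, s_span)] += 1
    rules = {}
    for (w_span, s_span), n in counts.most_common():
        if n >= min_count and w_span not in rules:
            rules[w_span] = s_span
    return rules


def to_spoken(sentence, rules):
    """Rewrite a written-style sentence left to right, trying longer spans first."""
    tokens = sentence.split()
    lengths = sorted({len(span) for span in rules}, reverse=True)
    out, i = [], 0
    while i < len(tokens):
        for n in lengths:
            span = tuple(tokens[i:i + n])
            if len(span) == n and span in rules:
                out.extend(rules[span])
                i += n
                break
        else:
            # no rule matched at this position: copy the token unchanged
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical toy data (written form, spoken form); not taken from the paper's corpora.
toy_pairs = [
    ("we do not know the answer", "we don't know the answer"),
    ("they do not agree however", "they don't agree though"),
]
toy_rules = learn_rules(toy_pairs)
print(to_spoken("I do not care however", toy_rules))  # I don't care though
```

In this sketch only substitutions become rules, since a pure insertion or deletion has no written-side span to match against without extra context. Raising `min_count` would discard differences that appear too rarely to be reliable.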

Similar resources

Semantic processing survey of spoken and written words in adolescents with cerebral palsy: Evidence from PALPA word-picture matching test

Objective: The present study aimed to assess and compare semantic processing of spoken and written words in adolescents with cerebral palsy and healthy adolescents. Method: The present study is quantitative in type and experimental in method. The examination group consisted of 30 adolescents with cerebral palsy, aged 10 to 15 years, selected by convenience sampling. All of ...

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

Spoken Language Translation With The ITSVox System

This paper describes the ITSVox speech-to-speech translation prototype currently under development at LATL in collaboration with IDIAP. The ITSVox project aims at a general, interactive, multimodal translation system with the following characteristics: (i) it is not restricted to a particular subdomain, (ii) it can be used either as a fully automatic system or as an interactive system, (iii) it ...

Robust Extraction of Subcategorization Data from Spoken Language

Subcategorization data has been crucial for various NLP tasks. Current methods for automatic SCF acquisition usually proceed in two steps: first, generate all SCF cues from a corpus using a parser, and then filter out spurious SCF cues with statistical tests. Previous studies on SCF acquisition have worked mainly with written texts; spoken corpora have received little attention. Transcripts of ...

Transcribing human-directed speech for spoken language processing

As storage costs drop and bandwidth increases, there has been a rapid growth of spoken information available via the web or in online archives, raising problems of document retrieval, information extraction, summarization and translation for spoken language. While there is a long tradition of research in these technologies for text, new challenges arise when moving from written to spoken langua...

Publication date: 2002